ABSTRACT
informational properties of the problem. This is accomplished by introducing

I. INTRODUCTION

In this part of the paper, the development of the theory of the
finite-state, finite-memory (FSFM) stochastic control problem initiated in
Part I [1] is continued.

Specifically, the sufficiency of the FSFM minimum principle (which is
in general only a necessary condition) is investigated. By introducing the
notion of a signaling strategy as defined in the literature on games in
extensive form [2], conditions under which the FSFM minimum principle is
sufficient are determined. This result is interesting since it explicitly
interconnects the information structure of the FSFM problem with its
optimality conditions.

The paper closes with a discussion of the min-H algorithm for the FSFM
problem. It is demonstrated that a version of the algorithm always converges
to a particular type of local minimum termed a person-by-person extremal.


II. SIGNALING AND SUFFICIENCY 


The notion of a signaling strategy arises in the theory of Kuhn-type
extensive games. According to Kuhn,^1 an extensive game is a game tree with

(i) a partition of the vertices with alternatives into the chance moves
$P_0$ and player moves $P_1, \ldots, P_n$,

(ii) a partition of the moves of $P_i$ into information sets,

(iii) a probability distribution on the alternatives of the information
sets of $P_0$,

(iv) an n-tuple of real numbers for each terminal vertex.

An example of a Kuhn-type extensive game is shown in Figure 1.
There is one chance move in $P_0$ with four alternatives. Each alternative
consists of the choice of an outcome of tossing two pennies. Thus
each outcome occurs with probability 1/4. There are four moves in $P_1$,
and player one's information set is equal to $P_1$. Thus player one does
not know the outcome of the first chance move, and has to guess if the
pennies match or don't match. If he guesses correctly, he gets to keep
his own penny and player two's penny (the payoff is (+1, -1)). If
he guesses incorrectly, he loses his penny to player two (the payoff is
(-1, +1)).
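The matching pennies game can be checked numerically. The sketch below is only an illustration under stated assumptions: the outcome encoding ("H"/"T") and the names `payoff` and `expected_payoff` are hypothetical and do not come from the paper. It enumerates the four equally likely chance alternatives and averages the payoff pair for each of player one's two pure strategies, which are single guesses because player one has one information set.

```python
from fractions import Fraction
from itertools import product

# The four alternatives of the chance move: outcomes of tossing two pennies.
OUTCOMES = list(product("HT", repeat=2))
PROB = Fraction(1, len(OUTCOMES))  # each outcome has probability 1/4


def payoff(guess, outcome):
    """Payoff pair (player one, player two) for a guess of 'match' or 'differ'."""
    matched = outcome[0] == outcome[1]
    correct = (guess == "match") == matched
    return (1, -1) if correct else (-1, 1)


def expected_payoff(guess):
    """Average the payoff pair over the chance move."""
    first = sum(PROB * payoff(guess, o)[0] for o in OUTCOMES)
    second = sum(PROB * payoff(guess, o)[1] for o in OUTCOMES)
    return first, second
```

Because player one cannot observe the chance move, both guesses are correct on exactly two of the four outcomes, so both pure strategies have the same expected payoff.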


^1 See [3] for a complete exposition.

[Figure 1. Matching Pennies: game tree with terminal payoffs (+1, -1) and (-1, +1).]


Every FSFM problem can be reduced to a Kuhn extensive game. It might
be thought that the reduction is accomplished by identifying the player's
alternatives with the controller's inputs, but this is not always
possible. Suppose, for example, that $X_0 = \{1, 2\}$, $U_1 = \{0, 1\}$, and
that $\Gamma_1$ is a set of control laws $\gamma_1$ that includes the law with
$\gamma_1(1) = 1$, $\gamma_1(2) = 0$ [remainder of the definition illegible in
the source].^2 Clearly, the game tree for this problem must have its first
seven nodes as in Figure 2, with vertices 1 and 2 in the set of moves of
player one (the only player). However, it is not possible to partition $P_1$
into information sets so that the restriction that the same alternative
must be chosen for each vertex in a given information set is equivalent
to the restriction that the control law must lie in $\Gamma_1$. The point is
that restricting the control laws to lie in an arbitrary subset of the set
of all maps from $X_0$ to $U_1$ is a more general restriction than one based
on information.

Thus, it is in general necessary to identify the player's alternatives
with the set of control laws. This is undesirable since the game does
not exhibit the information properties of the FSFM problem. However,
it will be shown next that the first reduction (identifying
alternatives with controller inputs) is possible for FSFM problems with
simple information constraint.


^2 The choice of $\Gamma_1$ seems unnatural, but has appeared in the
literature [4]. Some of the control laws in $\Gamma_1$ are closed-loop control
laws, the others open-loop control laws.

[Figure 2. Game Tree for FSFM Problem]

Definition 1

The FSFM problem defined by equations (1) and (2) of Part I is said to
have a simple information constraint if

$$\Gamma_t = \{\gamma_t : \gamma_t^{-1}(\mathcal{U}_t) \subset \mathcal{F}_{t-1}\} \tag{1}$$

for t = 1, 2, ..., T, where $\mathcal{U}_t = P(U_t)$ and $\mathcal{F}_{t-1}$ is a
subfield of $\mathcal{X}_{t-1} = P(X_{t-1})$.

The reason for restricting attention to FSFM problems with simple
information constraints is that these problems can be readily identified
with a corresponding Kuhn model of an extensive game.

Suppose that a FSFM problem with simple information constraint is
given. Let the sets $X_0, Q_1, U_1, \ldots, Q_T, U_T$ have
$n_0, n_1, m_1, \ldots, n_T, m_T$ elements, respectively. The rank 0 move^1 of
the corresponding game tree has $n_0$ alternatives. For $1 \le t \le T$, the
rank 2t-1 move has $n_t$ alternatives and the rank 2t move has $m_t$
alternatives. Thus every play has rank 2T + 1 (Figure 3).


^1 A move is a vertex of the game tree with alternatives; a play is a
(terminal) vertex without alternatives. The rank of a move or play is
the number of moves that precede it. See Kuhn [3] for details.

[Figure 3. Game Tree for FSFM Problem With Simple Information Constraint]



The chance moves are the moves with rank 1, 3, ..., 2T-1, and the moves
of player 1 (the only player) are the moves with rank 2, 4, ..., 2T. Each
alternative of the initial (rank 0) move of the game tree corresponds to an
element of $X_0$. Similarly, the alternatives of moves with rank 2t-1
correspond to elements of $Q_t$ and the alternatives of moves with rank 2t
correspond to elements of $U_t$.

Each information subset of $P_0$ contains a single point of $P_0$. The
information sets of $P_1$ are defined by the atoms^2 of $\mathcal{F}_t$ as
follows. Notice that the system equations of the FSFM problem define a map

$$X_0 \times Q_1 \times U_1 \times \cdots \times Q_t \times U_t \to X_t \tag{2}$$

which takes an initial state and a sequence of inputs and gives the
corresponding state $x(t)$. Each atom F of $\mathcal{F}_t$ defines a set

$$\{(x(0), q(1), u(1), \ldots, q(t), u(t)) : x(t) \in F\} \subset X_0 \times Q_1 \times U_1 \times \cdots \times Q_t \times U_t. \tag{3}$$

Since there is a one-to-one correspondence between the set
$X_0 \times Q_1 \times U_1 \times \cdots \times Q_t \times U_t$ and the moves of
rank 2t + 1 of the game, the partition induced on
$X_0 \times Q_1 \times U_1 \times \cdots \times Q_t \times U_t$ by the atoms of
$\mathcal{F}_t$ induces a partition on the corresponding set of moves. Thus
each atom $F \in \mathcal{F}_t$ gives rise to a single information set for
player 1 containing moves of player 1. As a consequence, all the moves of a
given information set are of the same rank. This is not surprising, since
the problem is sequential [6].

^2 An atom of a field $\mathcal{F}$ is a set $F \in \mathcal{F}$ such that if
$E \in \mathcal{F}$ and $E \subset F$, then either $E = \emptyset$ or $E = F$.
The atoms of a finite field always exist and form a partition [5].
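The atoms that define the information sets can be computed mechanically when the field is given by a list of generating sets. The following is a minimal sketch under that assumption (the paper does not say how a field is represented, and the name `atoms_of_field` is hypothetical). It uses the standard fact that, on a finite set, two points lie in the same atom of the generated field exactly when they belong to the same generators.

```python
def atoms_of_field(universe, generators):
    """Atoms of the field on `universe` generated by the sets in `generators`.

    Points are grouped by their membership signature: two points share an
    atom exactly when no generator separates them.
    """
    classes = {}
    for x in universe:
        signature = tuple(x in g for g in generators)
        classes.setdefault(signature, []).append(x)
    return sorted(classes.values())
```

For example, with states {0, 1, 2, 3} and the single generator {0, 1}, the atoms are [0, 1] and [2, 3], so a measurable control law must use one input on states 0 and 1 and one input on states 2 and 3.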

To finish the specification of the game, the probabilities of the
chance moves must be defined and the terminal cost specified. If an
information set of $P_0$ contains a move of rank 2t-1, its alternative
corresponding to $q \in Q_t$ is chosen with probability $p_t(q)$. The terminal
cost is determined by the fact that the plays are in one-to-one
correspondence with $X_0 \times Q_1 \times U_1 \times \cdots \times Q_T \times U_T$.
Thus each play determines a complete state-control trajectory for which J
can be evaluated. This value of J is the cost associated with the play.

In game theory, a strategy for player 1 is the assignment of a single
alternative to each information set. For FSFM problems with simple
information constraint, a control law $\gamma_t$ is the assignment of a point
in $U_t$ to each atom of $\mathcal{F}_{t-1}$ (since $\gamma_t$ is constrained
to be $\mathcal{F}_{t-1}$-measurable). Because of the manner in which the
information sets have been constructed above, there is clearly a one-to-one
correspondence between the control laws of a FSFM problem with simple
information constraint and the strategies of its corresponding extensive
game form. Thus the same notation $\gamma$ will be used to describe either a
control law sequence or a strategy for the equivalent extensive game.

Since equivalence has been established between FSFM models with
simple information constraint and Kuhn extensive game models, the notions
of signaling strategy and perfect recall can now be precisely defined.
The following definitions and propositions are stated for 1-player games,
but can be easily extended to n-person games.

Definition 2 [3].

A move Z of player 1 (n = 1) is called possible when playing $\gamma$ if it
has non-zero probability of occurring when the strategy $\gamma$ is used. An
information set I for player 1 is called relevant when playing $\gamma$ if
some $Z \in I$ is possible when playing $\gamma$.

Proposition 1.

A move Z for player 1 is possible when playing $\gamma$ if and only if
$\gamma$ chooses all alternatives on the path $W_Z$ from the origin to Z which
are incident at moves of player 1.^1

Proof

See reference [3], page 201.

Definition 3 [3].

A game G is said to have perfect recall if I is relevant when playing
$\gamma$ and $Z \in I$ implies that Z is possible when playing $\gamma$, for
all I, Z, and $\gamma$.

Definition 4 [2].

Let I be an information set for player 1, and let $I_u$ = {moves following
some move in I by alternative u}. Then I is a signaling information set
for player 1 if, for some u and some information set J of player 1,

$$I_u \cap J \ne \emptyset \quad \text{and} \quad J \not\subset I_u.$$

^1 All chance moves are assumed to occur with non-zero probability.

Proposition 2 [2] . 

A game G has perfect recall if and only if player 1 has no signaling 
information sets. 

Proof 

See reference [2], page 268. 

The following proposition is not valid for general games, but is a
special property of 1-person (stochastic control) problems.

Proposition 3.

Let G be a 1-person game with perfect recall, and let I be an
arbitrary information set of the player. If I is not relevant when
playing $\gamma$, then the probability of any move in I is zero under
$\gamma$. If I is relevant when playing $\gamma$, then the probability of any
move in I is positive under $\gamma$. Moreover, if I is relevant under any
other strategy $\hat{\gamma}$, then the probabilities of any move of I under
$\hat{\gamma}$ and $\gamma$ are the same.

Proof

If I is not relevant when playing $\gamma$, then by definition no move of
I is possible when playing $\gamma$. Thus the probability of any such move is
zero when $\gamma$ is used.

If I is relevant when playing $\gamma$, then every move of I is possible
when playing $\gamma$ since G has perfect recall. Thus the probability of any
such move is positive when $\gamma$ is used.

If $Z \in I$ is possible when playing $\gamma$, by Proposition 1 $\gamma$ must
choose all alternatives on the path $W_Z$ from the origin to Z which are
incident at moves of player 1. All other alternatives on $W_Z$ are incident
at chance moves, and the probability of Z under $\gamma$ is simply the product
of the probabilities of these alternatives. But this probability is the
same for $\hat{\gamma}$, since $\hat{\gamma}$ likewise chooses all alternatives
on the path $W_Z$ incident at moves of player 1.
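The argument in the proof can be made concrete with a small sketch. The path encoding below (a list of chance steps carrying a probability and player steps carrying an information set label and an alternative) is a hypothetical representation chosen for illustration, as is the name `move_probability`. The function returns zero if the strategy departs from the path at any player move, and otherwise the product of the chance probabilities along the path.

```python
def move_probability(path, strategy):
    """Probability of reaching the move at the end of `path` under `strategy`.

    path     : list of ("chance", probability) or ("player", info_set, alternative)
    strategy : dict mapping each information set label to the chosen alternative
    """
    probability = 1.0
    for step in path:
        if step[0] == "chance":
            probability *= step[1]
        else:
            _, info_set, alternative = step
            if strategy[info_set] != alternative:
                return 0.0  # the move is not possible when playing this strategy
    return probability
```

Two strategies that both choose the player alternatives on the path give the same value, whatever they prescribe elsewhere, which is the last claim of Proposition 3.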

At this point, the preceding definitions and propositions are applied
to the FSFM problem.

Definition 5.

A FSFM stochastic control problem is said to have perfect recall if
it has a simple information constraint and the corresponding extensive
game has perfect recall.

Definition 6.

A control law $\gamma_t$ of a FSFM problem with simple information constraint
is said to be a signaling control law if an atom of $\mathcal{F}_{t-1}$ gives
rise to a signaling information set in the corresponding extensive game.

Corollary 4.

A FSFM stochastic control problem with simple information constraint
has perfect recall if and only if it has no signaling control laws.

Proof

This is a direct consequence of the definitions, the construction of
the equivalent extensive game, and Proposition 2.

Theorem 5.

Suppose that a FSFM stochastic control problem with perfect recall
is given. Let A be an atom of $\mathcal{F}_{t-1}$; then, for any control law
sequence $\gamma$, either the probability of all states in A is zero, or the
probability of each state is a positive constant independent of $\gamma$.

Proof

By construction, the probability of a state $x(t-1) \in A$ under $\gamma$ is
equal to the probability of the corresponding set of moves in the
information set I generated by A. Therefore, the theorem follows
immediately from Proposition 3.

The property of FSFM problems with perfect recall expressed by Theorem
5 makes it possible to strengthen the minimum principle to achieve a
sufficient condition for optimality.


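Definition 7.

Let the set of state probability vectors reachable at time t,
$1 \le t \le T$, when the initial state probability vector is $\pi_0$, be
denoted

$$r_t(\pi_0) = \{\pi_0 P^{\gamma_1}(1) P^{\gamma_2}(2) \cdots P^{\gamma_t}(t) : \gamma_1 \in \Gamma_1, \gamma_2 \in \Gamma_2, \ldots, \gamma_t \in \Gamma_t\}. \tag{4}$$

$r_t(\pi_0)$ is called the reachable set ($r_0(\pi_0) = \{\pi_0\}$).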

Definition 8.

Suppose that the control law sequence
$\gamma^* = (\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*)$ satisfies the condition

$$\pi(t-1) P^{\gamma_t^*}(t) \phi^*(t) + \pi(t-1) h^{\gamma_t^*}(t) \le \pi(t-1) P^{\gamma_t}(t) \phi^*(t) + \pi(t-1) h^{\gamma_t}(t) \tag{5}$$

for all $\gamma_t \in \Gamma_t$, for all $\pi(t-1) \in r_{t-1}(\pi_0)$, where

$$\phi^*(t-1) = P^{\gamma_t^*}(t) \phi^*(t) + h^{\gamma_t^*}(t) \tag{6}$$

for t = 1, 2, ..., T ($\phi^*(T) = \phi_T$). Then $\gamma^*$ is said to be
universally extremal.


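Lemma 6.

Any universally extremal control law sequence is optimal.

Proof

The proof proceeds by induction on the number of stages T.
Suppose T = 1. Then

$$J(\gamma_1) = \pi(0) h^{\gamma_1}(1) + \pi(1) \phi(1) = \pi(0) h^{\gamma_1}(1) + \pi(0) P^{\gamma_1}(1) \phi(1) \tag{7}$$

so that any extremal is optimal.

Suppose the lemma is valid for problems with T-1 stages. It must be
established that the lemma is valid for problems with T stages.

Assume that $(\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*)$ is universally
extremal. It follows immediately that $(\gamma_2^*, \gamma_3^*, \ldots, \gamma_T^*)$
is universally extremal for the problem with cost

$$J(\gamma_2, \ldots, \gamma_T; \pi(1)) = \sum_{t=2}^{T} \pi(t-1) h^{\gamma_t}(t) + \pi(T) \phi_T \tag{8}$$

for any $\pi(1) \in r_1(\pi_0)$. Therefore, by the induction hypothesis,

$$J(\gamma_2^*, \ldots, \gamma_T^*; \pi(1)) \le J(\gamma_2, \ldots, \gamma_T; \pi(1)) \tag{9}$$

for all $\pi(1) \in r_1(\pi_0)$, for all
$\gamma_2 \in \Gamma_2, \ldots, \gamma_T \in \Gamma_T$. Moreover, since

$$J(\gamma_1, \gamma_2, \ldots, \gamma_T) = \pi(0) h^{\gamma_1}(1) + J(\gamma_2, \ldots, \gamma_T; \pi(0) P^{\gamma_1}(1)) \tag{10}$$

it follows that

$$J(\gamma_1, \gamma_2^*, \ldots, \gamma_T^*) \le J(\gamma_1, \gamma_2, \ldots, \gamma_T) \tag{11}$$

for all $\gamma_1 \in \Gamma_1, \gamma_2 \in \Gamma_2, \ldots, \gamma_T \in \Gamma_T$.

But the assumption that $(\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*)$ is
universally extremal implies that

$$J(\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*) = \pi(0) h^{\gamma_1^*}(1) + \pi(0) P^{\gamma_1^*}(1) \phi^*(1) \le \pi(0) h^{\gamma_1}(1) + \pi(0) P^{\gamma_1}(1) \phi^*(1) = J(\gamma_1, \gamma_2^*, \ldots, \gamma_T^*) \tag{12}$$

for all $\gamma_1 \in \Gamma_1$. The lemma follows from (12) and (11).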

Notice from the proof of Lemma 6 that the existence of a universally
extremal control law sequence $\gamma^*$ implies the unusual fact that the
problems of minimizing the remaining cost from stage t onward, one for each
choice of $\gamma_1 \in \Gamma_1, \ldots, \gamma_{t-1} \in \Gamma_{t-1}$, have a
common solution $(\gamma_t^*, \ldots, \gamma_T^*)$.
Thus the existence of a universal extremal would seem to be rather unlikely.
From this viewpoint, the following property of FSFM problems with perfect
recall seems rather remarkable.

Theorem 7.

Every FSFM problem with perfect recall has a universally extremal
control law sequence.

Proof

The proof is constructive. The control laws $\gamma_t^*$ are defined by
choosing their values on the atoms of $\mathcal{F}_{t-1}$.

Consider the case for t = T. Let $A_i$ be an atom of $\mathcal{F}_{T-1}$,
i = 1, 2, ..., p. For simplicity of notation, suppose that $A_1$ contains the
first $s_1$ states of $X_{T-1}$, $A_2$ contains states $s_1 + 1$ through $s_2$
of $X_{T-1}$, etc. Notice that

$$\pi(T-1) P^{\gamma_T}(T) \phi(T) + \pi(T-1) h^{\gamma_T}(T) = \sum_{i=1}^{p} \sum_{j \in A_i} \pi_j(T-1) \left[ \sum_{k=1}^{n} P_{jk}^{u_i(T)}(T) \phi_k(T) + h_j^{u_i(T)}(T) \right] \tag{14}$$

where n is the number of states in $X_T$ and $u_i(T)$ is the value of
$\gamma_T$ on the ith atom of $\mathcal{F}_{T-1}$.

The decomposition (14) makes the construction of $\gamma_T^*$ clear.
By Theorem 5, every vector $\pi(T-1) \in r_{T-1}(\pi_0)$ either has
$\pi_j(T-1) = 0$ for all $j \in A_i$, or has $\pi_j(T-1) = \hat{\pi}_j(T-1)$ for
all $j \in A_i$, where each $\hat{\pi}_j(T-1)$ is a fixed number independent of
$\gamma_1, \ldots, \gamma_{T-1}$. Therefore, $\gamma_T^*$ takes the value
$u_i^*(T)$ on the ith atom of $\mathcal{F}_{T-1}$, where

$$\min_{u \in U_T} \sum_{j \in A_i} \hat{\pi}_j(T-1) \left[ \sum_{k=1}^{n} P_{jk}^{u}(T) \phi_k(T) + h_j^{u}(T) \right] = \sum_{j \in A_i} \hat{\pi}_j(T-1) \left[ \sum_{k=1}^{n} P_{jk}^{u_i^*(T)}(T) \phi_k(T) + h_j^{u_i^*(T)}(T) \right]. \tag{15}$$

The construction of the remaining $\gamma_t^*$ is completed by applying an
analogous procedure to

$$\pi(t-1) P^{\gamma_t}(t) \phi^*(t) + \pi(t-1) h^{\gamma_t}(t). \tag{16}$$


Theorem 7 is primarily of theoretical and conceptual importance.
Problems with perfect recall are more efficiently handled by deriving an
equivalent deterministic problem that has a conditional probability vector
for the deterministic state. (The conditioning is with respect to the
field $\mathcal{F}_{t-1}$.) Special cases of this procedure are implicit in
the usual stochastic dynamic programming algorithm [7, 8, 9] and the
algorithm of Sandell and Athans for the 1-step delay problem [10].
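For the special case of full state information, where every atom is a single state, the backward construction reduces to ordinary stochastic dynamic programming. The sketch below illustrates only that special case. The data layout is an assumption made for this example: `P[t][u][i][k]` is the transition probability from state i to state k under input u at stage t+1, `h[t][u][i]` is the stage cost, and `phi_T` is the terminal cost vector. The function name `backward_dp` is hypothetical.

```python
def stage_value(i, u, P_t, h_t, phi_next):
    """h_i^u + sum_k P_ik^u * phi_k for one state i and input u."""
    return h_t[u][i] + sum(p * f for p, f in zip(P_t[u][i], phi_next))


def backward_dp(P, h, phi_T):
    """Backward recursion choosing, per state, the input minimizing the cost-to-go.

    Returns (laws, phi0): laws[t][i] is the input used in state i at stage t+1,
    and phi0[i] is the optimal expected cost starting from state i.
    """
    n, m = len(phi_T), len(P[0])
    phi = list(phi_T)
    laws = []
    for t in range(len(P) - 1, -1, -1):
        law = [min(range(m), key=lambda u, i=i: stage_value(i, u, P[t], h[t], phi))
               for i in range(n)]
        phi = [stage_value(i, law[i], P[t], h[t], phi) for i in range(n)]
        laws.insert(0, law)
    return laws, phi
```

The expected cost for an initial distribution $\pi_0$ is then the inner product of $\pi_0$ with the returned vector `phi0`. Each state's minimizing input does not depend on the state probabilities, which is why one law serves every reachable distribution in this special case.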



III. A FSFM MIN-H ALGORITHM


A substantial number of numerical algorithms have been suggested
for the solution of deterministic optimal control problems. The most
natural of these for the FSFM problem is the min-H algorithm, which is
intimately related to the minimum principle. The min-H algorithm was
initially suggested by Kelley [11]. Platzman [12] has shown that the
algorithm is equivalent to Howard's policy iteration method for Markovian
decision processes, and has suggested its application to the imperfect
state information case of that problem.

To simplify the notation, the sets $X_t$ and $U_t$ are assumed to have
a constant cardinality for $0 \le t \le T$.

Algorithm (Min-H)

1. Guess $\gamma_1^0, \gamma_2^0, \ldots, \gamma_T^0$. Set j = 0.

2. Compute $\phi^j(T), \phi^j(T-1), \ldots, \phi^j(1)$ using
   $\gamma_1^j, \ldots, \gamma_T^j$ in the adjoint equation
   ($\phi^j(T) = \phi_T$). Set t = 1.

3. Choose $\gamma_t^{j+1}$ to minimize
   $\pi^{j+1}(t-1) P^{\gamma_t}(t) \phi^j(t) + \pi^{j+1}(t-1) h^{\gamma_t}(t)$^1
   ($\pi^{j+1}(0) = \pi_0$).

4. If t < T, compute $\pi^{j+1}(t) = \pi^{j+1}(t-1) P^{\gamma_t^{j+1}}(t)$.
   Set t = t+1, and go to 3.

5. If t = T, test $\Delta J^j = J^{j+1} - J^j$, where

   $$J^j = \sum_{t=1}^{T} \pi^j(t-1) h^{\gamma_t^j}(t) + \pi^j(T) \phi_T.$$

   If $J^{j+1} < J^j$, set j = j+1, t = 0, and go to 2.
   If $J^{j+1} = J^j$, stop.

^1 If $\gamma_t^{j+1}$ is not unique, choose arbitrarily but with preference
for $\gamma_t^j$ if it is in the minimizing set.
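The steps of the algorithm can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation discussed in [13]: it assumes a simple information constraint so that each $\Gamma_t$ is described by a partition of the states into atoms and the minimization in step 3 can be carried out atom by atom. The data layout (`P[t][u][i][k]`, `h[t][u][i]`, `atoms[t]`, laws stored as one input per atom), the tolerance `tol`, and all function names are hypothetical.

```python
def inputs(law, atoms_t, n):
    """Expand a law (one input per atom) into one input per state."""
    u = [0] * n
    for a, states in enumerate(atoms_t):
        for i in states:
            u[i] = law[a]
    return u

def term(i, u, P_t, h_t, phi_next):
    """h_i^u(t) + sum_k P_ik^u(t) phi_k(t)."""
    return h_t[u][i] + sum(p * f for p, f in zip(P_t[u][i], phi_next))

def propagate(pi, P_t, u):
    """State equation pi(t) = pi(t-1) P^{gamma_t}(t)."""
    n = len(pi)
    return [sum(pi[i] * P_t[u[i]][i][k] for i in range(n)) for k in range(n)]

def costates(gamma, P, h, phi_T, atoms):
    """Adjoint equation, solved backward from phi(T) = phi_T (step 2)."""
    n, T = len(phi_T), len(P)
    phi = [None] * T + [list(phi_T)]
    for t in range(T - 1, -1, -1):
        u = inputs(gamma[t], atoms[t], n)
        phi[t] = [term(i, u[i], P[t], h[t], phi[t + 1]) for i in range(n)]
    return phi

def cost(gamma, P, h, phi_T, pi0, atoms):
    """J = sum_t pi(t-1) h^{gamma_t}(t) + pi(T) phi_T."""
    n, pi, J = len(pi0), list(pi0), 0.0
    for t in range(len(P)):
        u = inputs(gamma[t], atoms[t], n)
        J += sum(pi[i] * h[t][u[i]][i] for i in range(n))
        pi = propagate(pi, P[t], u)
    return J + sum(p * f for p, f in zip(pi, phi_T))

def min_h(P, h, phi_T, pi0, atoms, gamma0, tol=1e-12):
    """Iterate steps 2 to 5 until the cost stops decreasing."""
    n, T, m = len(pi0), len(P), len(P[0])
    gamma = [list(law) for law in gamma0]
    J = cost(gamma, P, h, phi_T, pi0, atoms)
    while True:
        phi = costates(gamma, P, h, phi_T, atoms)
        pi, new = list(pi0), []
        for t in range(T):
            law = []
            for a, states in enumerate(atoms[t]):
                def H(u):
                    return sum(pi[i] * term(i, u, P[t], h[t], phi[t + 1])
                               for i in states)
                best = gamma[t][a]  # footnote: prefer the previous law on ties
                for u in range(m):
                    if H(u) < H(best) - tol:
                        best = u
                law.append(best)
            new.append(law)
            pi = propagate(pi, P[t], inputs(law, atoms[t], n))  # step 4
        J_new = cost(new, P, h, phi_T, pi0, atoms)
        if J_new < J - tol:
            gamma, J = new, J_new
        else:
            return gamma, J
```

The forward sweep uses the updated state probabilities together with the costates of the previous iterate, so each stage is a one-coordinate minimization and the cost cannot increase. The tolerance replaces the exact equality test of step 5 to guard against floating-point rounding.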

Theorem 8.

The preceding algorithm converges in a finite number of steps
to an extremal solution.

Proof

Let $S = \{J(\gamma) \mid \gamma \in \Gamma\}$. Since S is finite, its elements
can be arranged in descending order,

$$S = \{J_1, J_2, \ldots, J_N\}, \qquad J_1 > J_2 > \cdots > J_N. \tag{17}$$

Consider the set of positive numbers

$$R = \{J_i - J_{i+1} : i = 1, \ldots, N-1\} \tag{18}$$

and let $\epsilon = \inf R$. Note that $\epsilon > 0$.

Consider the difference $\Delta J^j$ defined in the algorithm. Clearly,
either $\Delta J^j = 0$, or $\Delta J^j \le -\epsilon$. By induction, if the
algorithm has not converged by step j, then

$$J^j \le J^0 - j\epsilon. \tag{19}$$

Therefore, eventually $J^{j+1} = J^j$, since $\inf S$ is finite. But
$J^{j+1} = J^j$ implies that $(\gamma_1^j, \ldots, \gamma_T^j)$ is extremal.

Although the FSFM Min-H algorithm is guaranteed to converge in a finite
number of steps, the amount of computation per step may be prohibitive, even
if full advantage of the special structure of the problem is made (see [13]
for a discussion and estimates of computation time). Thus modifications to
the basic algorithm for special cases are of interest.

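Consider the case in which

$$\Gamma_t = \Gamma_t^1 \times \Gamma_t^2 \times \cdots \times \Gamma_t^k \tag{20}$$

and $\Gamma_t^i$ consists of control laws measurable with respect to a
subfield $\mathcal{F}_{t-1}^i$. Make the following notational convention:

$$\gamma_t = (\gamma_t^1, \ldots, \gamma_t^k) \tag{21}$$

$$\gamma = (\gamma_1, \ldots, \gamma_T). \tag{22}$$

Then

$$J(\gamma_1, \gamma_2, \ldots, \gamma_T) = J(\gamma_1^1, \ldots, \gamma_1^k, \gamma_2^1, \ldots, \gamma_2^k, \ldots, \gamma_T^1, \ldots, \gamma_T^k). \tag{23}$$

Definition

A sequence

$$\gamma^* = (\gamma_1^*, \ldots, \gamma_T^*) = (\gamma_1^{1*}, \ldots, \gamma_1^{k*}, \ldots, \gamma_T^{1*}, \ldots, \gamma_T^{k*})$$

is said to be a person-by-person extremal if

$$J(\gamma_1^{1*}, \ldots, \gamma_t^{i*}, \ldots, \gamma_T^{k*}) \le J(\gamma_1^{1*}, \ldots, \gamma_t^{i}, \ldots, \gamma_T^{k*}) \quad \text{for all } \gamma_t^i \in \Gamma_t^i, \; i = 1, \ldots, k, \; t = 1, \ldots, T. \tag{24}$$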


Every optimal control law sequence is a person-by-person extremal,
but the converse need not be true. Clearly, the FSFM Min-H algorithm can be
modified to give an algorithm that always converges to a person-by-person
extremal. One possible order of minimization is

$$\gamma_1^1, \ldots, \gamma_T^1, \; \gamma_1^2, \ldots, \gamma_T^2, \; \ldots, \; \gamma_1^k, \ldots, \gamma_T^k.$$

Thus k forward and backward sweeps of the state and costate equations are
required per iteration. The number of multiplications required is considerably
reduced. See [13] for details. Clearly, the person-by-person Min-H algorithm
is finitely convergent to a person-by-person extremal solution.
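The gap between a person-by-person extremal and an optimum can be seen on a very small example. The sketch below is a generic coordinate-wise search over finite choice sets, offered as an illustration and not as the person-by-person Min-H algorithm of [13]: it evaluates the cost directly instead of using state and costate sweeps. The name `person_by_person` and the two-by-two cost table in the usage note are hypothetical.

```python
def person_by_person(J, choice_sets, start):
    """Change one component at a time until no single change lowers J.

    J           : function mapping a list of choices to a cost
    choice_sets : one finite list of admissible choices per component
    start       : initial list of choices
    """
    current = list(start)
    improved = True
    while improved:
        improved = False
        for idx, choices in enumerate(choice_sets):
            for c in choices:
                trial = current[:idx] + [c] + current[idx + 1:]
                if J(trial) < J(current):
                    current, improved = trial, True
    return current
```

With costs J(0,0) = 1, J(1,1) = 0 and J(0,1) = J(1,0) = 2, the search started at (0, 0) stays there: no single component change helps, although (1, 1) is strictly better. Started at (0, 1) it reaches (1, 1). Termination follows from the same finiteness argument as in Theorem 8, since every accepted change strictly lowers the cost.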



Notice that the person-by-person approach is consistent with the minimum
principle approach:

1. Both approaches give necessary conditions for optimality.

2. Both approaches are sufficient only under convexity assumptions
that do not hold in general.

3. An initial guess is improved, but the improvement may stop
short of optimal.

These facts are consequences of the fact that the person-by-person and
min-H algorithms are actually both concrete realizations of orthogonal
search. The Min-H algorithm minimizes the cost without coordinated choice
of the control laws at different times. The person-by-person Min-H
algorithm minimizes the cost without coordinated choice of the control
laws of the various controllers at a fixed time instant.

IV. SUMMARY AND CONCLUSIONS

The notion of signaling has been introduced from game theory and
shown to be relevant to the FSFM problem. In fact, the signaling
phenomenon is of general importance in nonclassical stochastic control
theory. The presence of signaling makes it necessary for decentralized
controllers to employ control laws with a dual purpose: simultaneous
communication and control. The presence of signaling in LQG problems
manifests itself in the nonlinear strategies that are optimal for these
problems [1, 14]. (Given the prevalence of nonlinear coding and modulation
techniques in communication theory, the existence of nonlinear optimal
strategies for nonclassical LQG problems is hardly surprising.) Moreover,
the absence of signaling in LQG problems (in the LQG context, equivalent
to the presence of Ho-Chu nesting) ensures the optimality of linear
strategies [15]. Thus the very special nature of the classical stochastic
control problem is made clear: only the control aspect of the dual problems
of communication and control need be considered.

The need to simultaneously solve a control and communication problem
makes the nonclassical stochastic control problem very difficult to solve,
even in the FSFM case. One approach to solution of the FSFM problem is the
person-by-person Min-H algorithm sketched in Section III. Presently,
evaluation of the algorithm is being carried out in the context of a highly
simplified model of an ARPA-type packet switching computer communication
network [16]. The primary difficulty is essentially combinatorial, since there
is an explosive growth in the number of states with network size. Thus
straightforward implementation of an algorithm seeking a "node-by-node" optimal
routing strategy is possible only for small networks, or larger networks with
an aggregated and/or merged [17] state set.